Lexer

  • In classical compiler design, a lexer’s job is to convert a flat character stream into a flat token stream. It does not understand nesting or structure beyond very local patterns.

  • In the standard model, the lexer is intentionally simple, and understanding nesting is the parser’s job.

  • From a theoretical lexer design standpoint, the key rule is:

  • A lexer should encode only what is unambiguous at the lexical level, and nothing that depends on grammar or semantics.

  • A lexer must never build language constructs whose shape depends on syntax or meaning, even if the meaning is “known”.

    • For example, take Vector2(20, 30): knowing that Vector2 is “just two f32s” is semantic knowledge, not lexical knowledge.
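
A minimal sketch of that boundary, in Rust (the Token type and its variant names are illustrative, not from any particular codebase): every variant is recognizable from the character stream alone.

```rust
// Lexical categories only: each variant is decidable from local
// character patterns, with no grammar or semantics involved.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String), // "Vector2" stays an uninterpreted name
    Number(f64),   // a numeric literal; choosing f32 vs i64 is a later phase
    Str(String),   // quoted string literal
    LParen,
    RParen,
    LBracket,
    RBracket,
    Comma,
    Equal,
}

// By contrast, a variant like `Vec2Literal(f32, f32)` would break the
// rule above: "two f32s" is semantic knowledge, invisible to the lexer.
```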

Examples

Example 1
  • [node name="flor" parent="." unique_id=2138173886 instance=ExtResource("3_7sc02")]

LBRACKET
IDENT(node)
IDENT(name)
EQUAL
STRING("flor")
IDENT(parent)
EQUAL
STRING(".")
IDENT(unique_id)
EQUAL
NUMBER(2138173886)
IDENT(instance)
EQUAL
IDENT(ExtResource)
LPAREN
STRING("3_7sc02")
RPAREN
RBRACKET
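
The stream above can be produced by a straightforward character-by-character scan. A minimal sketch in Rust, reusing the Token shape from the earlier sketch (lex is an illustrative name; string escapes and negative numbers are deliberately out of scope):

```rust
// A flat lexer: local patterns in, flat token stream out.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(f64),
    Str(String),
    LParen, RParen, LBracket, RBracket, Comma, Equal,
}

fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            _ if c.is_whitespace() => { chars.next(); }
            '(' => { chars.next(); tokens.push(Token::LParen); }
            ')' => { chars.next(); tokens.push(Token::RParen); }
            '[' => { chars.next(); tokens.push(Token::LBracket); }
            ']' => { chars.next(); tokens.push(Token::RBracket); }
            ',' => { chars.next(); tokens.push(Token::Comma); }
            '=' => { chars.next(); tokens.push(Token::Equal); }
            '"' => {
                chars.next(); // consume the opening quote
                let mut s = String::new();
                loop {
                    match chars.next() {
                        Some('"') => break,
                        Some(ch) => s.push(ch),
                        None => return Err("unterminated string".into()),
                    }
                }
                tokens.push(Token::Str(s));
            }
            _ if c.is_ascii_digit() => {
                let mut n = String::new();
                while let Some(&d) = chars.peek() {
                    if d.is_ascii_digit() || d == '.' { n.push(d); chars.next(); } else { break; }
                }
                tokens.push(Token::Number(n.parse().map_err(|e| format!("bad number: {e}"))?));
            }
            _ if c.is_alphabetic() || c == '_' => {
                let mut id = String::new();
                while let Some(&a) = chars.peek() {
                    if a.is_alphanumeric() || a == '_' { id.push(a); chars.next(); } else { break; }
                }
                tokens.push(Token::Ident(id));
            }
            other => return Err(format!("unexpected character {other:?}")),
        }
    }
    Ok(tokens)
}

fn main() {
    let line = r#"[node name="flor" parent="." unique_id=2138173886 instance=ExtResource("3_7sc02")]"#;
    for token in lex(line).unwrap() {
        println!("{token:?}");
    }
}
```

Running it prints the same sequence, one token per line, under this sketch's variant names.
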
Example 2
  • Vector2(506, 323)

  • must be tokenized into multiple tokens, not one.

  • Each token independently fits your { type, value } model.

  • Exact theoretical token sequence:

  • Token

    • type: IDENTIFIER

    • value: "Vector2"

  • Token

    • type: LEFT_PAREN

    • value: "(" or None

  • Token

    • type: NUMBER_LITERAL

    • value: 506 (numeric value, not string)

  • Token

    • type: COMMA

    • value: "," or None

  • Token

    • type: NUMBER_LITERAL

    • value: 323 (numeric value, not string)

  • Token

    • type: RIGHT_PAREN

    • value: ")" or None

  • That is the full and correct lexical output.
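
Expressed in the { type, value } model, the same sequence looks like the following. A sketch in Rust, with illustrative TokenType, Value, and Token names (the field is spelled r#type only because type is a Rust keyword):

```rust
// The exact lexical output for `Vector2(506, 323)` in { type, value } form.
#[derive(Debug)]
enum TokenType { Identifier, LeftParen, NumberLiteral, Comma, RightParen }

#[derive(Debug)]
enum Value { None, Str(String), Num(f64) }

#[derive(Debug)]
struct Token { r#type: TokenType, value: Value }

fn vector2_tokens() -> Vec<Token> {
    vec![
        Token { r#type: TokenType::Identifier,    value: Value::Str("Vector2".into()) },
        Token { r#type: TokenType::LeftParen,     value: Value::None },
        Token { r#type: TokenType::NumberLiteral, value: Value::Num(506.0) }, // numeric, not string
        Token { r#type: TokenType::Comma,         value: Value::None },
        Token { r#type: TokenType::NumberLiteral, value: Value::Num(323.0) },
        Token { r#type: TokenType::RightParen,    value: Value::None },
    ]
}
```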

  • Parser responsibility (high level; see the sketch after this list):

    • The parser’s job is to:

      • Consume tokens according to grammar rules

      • Establish structure and relationships

      • Produce an AST, not values

    • The parser does not:

      • Decide what Vector2 means

      • Construct arrays

      • Convert to f32

      • Perform semantic validation
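
A minimal recursive-descent sketch of that division of labor, in Rust (Expr, parse_call, and the grammar rule call := IDENT "(" expr ("," expr)* ")" are illustrative assumptions): the parser consumes tokens according to the rule and produces structure, never values.

```rust
// The parser consumes the flat token stream according to a grammar
// rule and records structure; it attaches no meaning to "Vector2".
#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), Number(f64), LParen, RParen, Comma }

#[derive(Debug, PartialEq)]
enum Expr {
    Number(f64),                               // literal as written, not yet typed
    Call { callee: String, args: Vec<Expr> },  // structure and relationships only
}

// call := IDENT "(" expr ("," expr)* ")"
fn parse_call(tokens: &[Token]) -> Result<Expr, String> {
    let mut i = 0;
    let callee = match tokens.get(i) {
        Some(Token::Ident(name)) => { i += 1; name.clone() }
        other => return Err(format!("expected identifier, got {other:?}")),
    };
    if tokens.get(i) != Some(&Token::LParen) {
        return Err("expected '('".into());
    }
    i += 1;
    let mut args = Vec::new();
    loop {
        // This sketch only supports number literals as arguments.
        match tokens.get(i) {
            Some(Token::Number(n)) => { args.push(Expr::Number(*n)); i += 1; }
            other => return Err(format!("expected expression, got {other:?}")),
        }
        match tokens.get(i) {
            Some(Token::Comma) => { i += 1; }
            Some(Token::RParen) => break,
            other => return Err(format!("expected ',' or ')', got {other:?}")),
        }
    }
    Ok(Expr::Call { callee, args })
}
```

For the Vector2(506, 323) token stream this yields Call { callee: "Vector2", args: [Number(506.0), Number(323.0)] }: an AST node, not an array of f32s.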

  • Where values are manipulated

    • Values are first legitimately created, evaluated, and manipulated during semantic analysis / constant evaluation, after parsing but before code generation.

    • Semantic analysis / constant evaluation / lowering

      • This phase may have different names, but conceptually it is where:

        • Symbols are resolved

        • Types are assigned

        • Expressions may be evaluated

        • Constants may be folded

        • Builtins may be lowered

        • IR-friendly representations are produced

      • This is the first phase allowed to manipulate actual values (sketched below).
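
A sketch of that first value-manipulating phase, in Rust (eval_const, Const, and the treatment of Vector2 as a constant-foldable builtin are illustrative assumptions, not a prescribed design):

```rust
// Semantic analysis / constant evaluation: the first phase that may
// turn AST nodes into actual values.
#[derive(Debug)]
enum Expr {
    Number(f64),
    Call { callee: String, args: Vec<Expr> },
}

// A lowered, IR-friendly constant. Only at this phase does
// "two f32s" exist anywhere in the pipeline.
#[derive(Debug, PartialEq)]
enum Const {
    F32(f32),
    Vec2([f32; 2]),
}

fn eval_const(expr: &Expr) -> Result<Const, String> {
    match expr {
        // Type assignment: the untyped literal becomes an f32 here.
        Expr::Number(n) => Ok(Const::F32(*n as f32)),
        // Builtin lowering: resolve the symbol, validate arity,
        // evaluate the arguments, and fold the constant.
        Expr::Call { callee, args } if callee.as_str() == "Vector2" => {
            if args.len() != 2 {
                return Err(format!("Vector2 expects 2 arguments, got {}", args.len()));
            }
            let mut parts = [0.0f32; 2];
            for (slot, arg) in parts.iter_mut().zip(args) {
                match eval_const(arg)? {
                    Const::F32(v) => *slot = v,
                    other => return Err(format!("expected scalar, got {other:?}")),
                }
            }
            Ok(Const::Vec2(parts))
        }
        Expr::Call { callee, .. } => Err(format!("unknown builtin {callee}")),
    }
}
```

Only here does “Vector2 is just two f32s” become executable knowledge: the builtin is resolved, arity is validated, the literals are assigned the f32 type, and the whole call folds into an IR-friendly constant.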